Noise robust front-end processing with voice activity detection based on periodic to aperiodic component ratio

Authors

  • Kentaro Ishizuka
  • Tomohiro Nakatani
  • Masakiyo Fujimoto
  • Noboru Miyazaki
Abstract

This paper proposes a front-end processing method for automatic speech recognition (ASR) that employs a voice activity detection (VAD) method based on the periodic to aperiodic component ratio (PAR). The proposed VAD method is called PARADE (PAR based Activity DEtection). By considering the powers of the periodic and aperiodic components of the observed signals simultaneously, PARADE can detect speech segments more precisely in the presence of noise than conventional VAD methods. In this paper, PARADE is applied to a front-end processing technique that employs a robust feature extraction method called SPADE (Subband based Periodicity and Aperiodicity DEcomposition). ASR performance in noise was examined with the CENSREC-1-C database, which includes connected continuous digit speech utterances drawn from CENSREC-1 (the Japanese version of AURORA-2). The results show that the SPADE front-end combined with PARADE achieves an average word accuracy of 74.22% at signal to noise ratios of 0 to 20 dB. This accuracy is significantly higher than that achieved by the ETSI ES 202 050 front-end (63.66%) and the SPADE front-end without PARADE (64.28%). These results also confirm that PARADE can improve the performance of front-end processing.
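
The abstract does not say how PARADE estimates the periodic and aperiodic powers, so the following is only a rough illustration of the general idea of a PAR-style decision, not the authors' algorithm. It assumes a frame made of a periodic signal plus uncorrelated noise, takes the autocorrelation peak within a plausible pitch-lag range as the periodic power, and attributes the rest of the frame power to the aperiodic component. The function names, the lag range of 20 to 160 samples (roughly 50 to 400 Hz if the sampling rate were 8 kHz), and the threshold of 1.0 are all hypothetical choices made for this sketch.

```python
import math
import random


def autocorr(frame, lag):
    """Mean of frame[i] * frame[i + lag] over the overlapping samples."""
    n = len(frame) - lag
    return sum(frame[i] * frame[i + lag] for i in range(n)) / n


def periodic_to_aperiodic_ratio(frame, min_lag, max_lag):
    """Crude PAR estimate (an assumption, not the PARADE estimator).

    For a periodic signal plus uncorrelated noise, the autocorrelation at the
    period approximates the periodic power; the remainder of the total power
    r(0) is attributed to the aperiodic component.
    """
    total = autocorr(frame, 0)
    if total <= 0.0:
        return 0.0  # silent frame: treat as having no periodic power
    peak = max(autocorr(frame, lag) for lag in range(min_lag, max_lag + 1))
    periodic = min(max(peak, 0.0), total)  # clip the estimate to [0, total]
    aperiodic = total - periodic
    if aperiodic <= 1e-12 * total:
        return math.inf  # essentially all power is periodic
    return periodic / aperiodic


def is_speech(frame, min_lag=20, max_lag=160, threshold=1.0):
    """Hypothetical decision rule: flag the frame when PAR exceeds a threshold."""
    return periodic_to_aperiodic_ratio(frame, min_lag, max_lag) > threshold


# Synthetic demo frames (not real speech): a sinusoid with an 80-sample
# period in additive Gaussian noise, and a frame of Gaussian noise alone.
rng = random.Random(0)
voiced = [math.sin(2 * math.pi * i / 80) + rng.gauss(0.0, 0.3) for i in range(400)]
noise = [rng.gauss(0.0, 1.0) for _ in range(400)]

print(is_speech(voiced), is_speech(noise))
```

A ratio of this kind depends on how the frame power splits between the two components rather than on the absolute level, which is one plausible reason a PAR-based detector could tolerate changes in noise level; the abstract itself only states that the two powers are considered simultaneously.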

Similar resources

Study of noise robust voice activity detection based on periodic component to aperiodic component ratio

This paper describes a study of noise robust voice activity detection (VAD) utilizing the periodic component to aperiodic component ratio (PAR). Although environmental sound changes dynamically in the real world, conventional noise robust features for VAD are sensitive to the non-stationarity of noise, which yields variations in the signal to noise ratio, and sometimes require a priori noise po...

A New Algorithm for Voice Activity Detection Based on Wavelet Packets (RESEARCH NOTE)

Speech constitutes much of the communicated information; most other perceived audio signals do not carry nearly as much information. Indeed, many non-speech signals may be classified as ‘noise’ in human communication. The process of separating conversational speech and noise is termed voice activity detection (VAD). This paper describes a new approach to VAD which is based on the Wavelet ...

A study of mutual front-end processing method based on statistical model for noise robust speech recognition

This paper addresses robust front-end processing for automatic speech recognition (ASR) in noise. Accurate recognition of corrupted speech requires noise robust front-end processing, e.g., voice activity detection (VAD) and noise suppression (NS). Typically, VAD and NS are combined as one-way processing, and are developed independently. However, VAD and NS should not be assumed to be independen...

Study of integration of statistical model-based voice activity detection and noise suppression

This paper addresses robust front-end processing for automatic speech recognition (ASR) in noisy environments. To recognize the corrupted speech accurately, it is necessary to employ robust methods against various types of interference. Usually, noise suppression (NS) is used for the front-end processing of ASR in noise. Voice activity detection (VAD) is also used for front-end processing to re...

Noise robust speech parameterization based on joint wavelet packet decomposition and autoregressive modeling

In this paper, a noise robust feature extraction algorithm using joint wavelet packet decomposition (WPD) and autoregressive (AR) modeling of the speech signal is presented. In contrast to the short time Fourier transform (STFT) based time-frequency signal representation, a computationally efficient WPD can lead to better representation of non-stationary parts of the speech signal (consonan...

Publication date: 2007